Audio–Visual Speech Recognition Based on Dual Cross-Modality Attentions with the Transformer Model

Related resources

Scale Based Features for Audiovisual Speech Recognition

This paper demonstrates the use of nonlinear image decomposition, in the form of a sieve, applied to the task of audiovisual speech recognition of a database of the letters A–Z for ten talkers. A scale based feature vector is formed directly from the grayscale pixels of an image containing the talker's mouth on a per frame basis. This is independent of image amplitude and position information an...

Mortality Forecasting Based on Lee-Carter Model

Over the past decades, a number of approaches have been applied for forecasting mortality. In 1992, a new method for long-run forecast of the level and age pattern of mortality was published by Lee and Carter. This method was welcomed by many authors, so it was extended through a wider class of generalized, parametric and nonlinear models. This model represents one of the most influential recent d...

Audiovisual Phonologic-Feature-Based Recognition of Dysarthric Speech

Automatic dictation software with reasonably high word recognition accuracy is now widely available to the general public. Many people with gross motor impairment, including some people with cerebral palsy and closed head injuries, have not enjoyed the benefit of these advances, because their general motor impairment includes a component of dysarthria: reduced speech intelligibility caused by n...

Epistemic Modality in English and Persian Academic Writing: A Cross-Linguistic Study of Genre on the Notion of Transfer

Abstract: The field of academic writing has recently seen a major shift from impersonality (objectivity) to personality. The personal nature of academic texts highlights the importance of epistemic modality because, according to one of the definitions offered for this category, epistemic modality is closely tied to personality and is regarded as the expression of the speaker's personal opinion about the propositional component of an utterance. Therefore, considering the common points...

End-to-end Audiovisual Speech Recognition

Several end-to-end deep learning approaches have been recently presented which extract either audio or visual features from the input images or audio signals and perform speech recognition. However, research on end-to-end audiovisual models is very limited. In this work, we present an end-to-end audiovisual model based on residual networks and Bidirectional Gated Recurrent Units (BGRUs). To the ...

Journal

Journal title: Applied Sciences

Year: 2020

ISSN: 2076-3417

DOI: 10.3390/app10207263